On the Communication Complexity of Distributed Clustering

نویسنده

  • Qin Zhang
چکیده

In this paper we give a first set of communication lower bounds for distributed clustering problems, in particular, for k-center, k-median and k-means. When the input is distributed across a large number of machines and the number of clusters k is small, our lower bounds match the current best upper bounds up to a logarithmic factor. We have designed a new composition framework in our proofs for multiparty number-in-hand communication complexity which may be of independent interest.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

متن کامل

Distributed Clustering on Graphs

This paper provides new algorithms for distributed clustering for two popular center-based objec-tives, k-median and k-means. These algorithms have provable guarantees and improve communicationcomplexity over existing approaches. Following a classic approach in clustering by [13], we reduce theproblem of finding a clustering with low cost to the problem of finding a ‘coreset’ of...

متن کامل

Distributed k-Means and k-Median Clustering on General Topologies

This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We p...

متن کامل

K-Means Clustering with Distributed Dimensions

Distributed clustering has attracted significant attention in recent years. In this paper, we study the k-means problem in the distributed dimension setting, where the dimensions of the data are partitioned across multiple machines. We provide new approximation algorithms, which incur low communication costs and achieve constant approximation ratios. The communication complexity of our algorith...

متن کامل

ON FUZZY NEIGHBORHOOD BASED CLUSTERING ALGORITHM WITH LOW COMPLEXITY

The main purpose of this paper is to achieve improvement in thespeed of Fuzzy Joint Points (FJP) algorithm. Since FJP approach is a basisfor fuzzy neighborhood based clustering algorithms such as Noise-Robust FJP(NRFJP) and Fuzzy Neighborhood DBSCAN (FN-DBSCAN), improving FJPalgorithm would an important achievement in terms of these FJP-based meth-ods. Although FJP has many advantages such as r...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1507.00026  شماره 

صفحات  -

تاریخ انتشار 2015